Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[On-Device-Training] Upgrade Flatbuffers to Support 2GB+ Checkpoints. #19770

Merged
merged 17 commits into from
Mar 14, 2024

Conversation

AdamLouly
Copy link
Contributor

Description

Modifications to support 2GB+ checkpoint & Upgrading Flatbuffers

Motivation and Context

This PR includes changes that will make ort handle 2GB+ checkpoints.
To do that we need to upgrade flatbuffers to 23.5.9 - google/flatbuffers#7945

  • Modified the commitHash and the hash for the new version
  • Removed the patch for rust generator's unused variable warning as it is no longer producing this - Check it out here
  • Updated the VerifyField calls with alignment values that were introduced in the new version.

cmake/deps.txt Outdated Show resolved Hide resolved
@edgchen1
Copy link
Contributor

edgchen1 commented Mar 6, 2024

A few places that may need updating:

It is possible to use another flatc as well, e.g., from a separate installation. Note that ONNX Runtime uses
FlatBuffers 1.12.

#flatbuffers 1.11.0 does not have flatbuffers::IsOutRange, therefore we require 1.12.0+
FetchContent_Declare(
flatbuffers
URL ${DEP_URL_flatbuffers}
URL_HASH SHA1=${DEP_SHA1_flatbuffers}
PATCH_COMMAND ${ONNXRUNTIME_FLATBUFFERS_PATCH_COMMAND}
FIND_PACKAGE_ARGS 1.12.0...<2.0.0 NAMES Flatbuffers
)

"flatbuffers": "^1.12.0",

"flatbuffers": "^1.12.0",

@fs-eire would the onnxruntime-web flatbuffers dependency need to be updated as well?

@AdamLouly AdamLouly marked this pull request as ready for review March 6, 2024 19:47
@AdamLouly AdamLouly requested review from a team as code owners March 6, 2024 19:47
@AdamLouly AdamLouly requested a review from askhade March 6, 2024 19:48
@sumitsays
Copy link
Contributor

From DirectML perspective, it looks good to me.

@fs-eire
Copy link
Contributor

fs-eire commented Mar 8, 2024

ONNX Runtime Web CI failed. This is because this PR does partially upgrade for flatbuffer for ort-web.

Please either revert this change for ort-web, or perform a full upgrade

  • Use npm install flatbuffers@^24.3.7 to install so that packages-lock.json will be refreshed with dependencies version
  • Follow instructions in js/web/onnxjs/ort-schema/flatbuffers/README.md to generate new flatbuffer files for v24.3.7 and update the files in js/web/onnxjs/ort-schema/flatbuffers/

We don't necessarily need this change for ort-web because only WebGL is using this flatbuffer library as dependencies, and we do not have plan to support on device training for WebGL backend. For Wasm/CPU/WebGPU, this library is not needed.

@AdamLouly
Copy link
Contributor Author

ONNX Runtime Web CI failed. This is because this PR does partially upgrade for flatbuffer for ort-web.

Please either revert this change for ort-web, or perform a full upgrade

  • Use npm install flatbuffers@^24.3.7 to install so that packages-lock.json will be refreshed with dependencies version
  • Follow instructions in js/web/onnxjs/ort-schema/flatbuffers/README.md to generate new flatbuffer files for v24.3.7 and update the files in js/web/onnxjs/ort-schema/flatbuffers/

We don't necessarily need this change for ort-web because only WebGL is using this flatbuffer library as dependencies, and we do not have plan to support on device training for WebGL backend. For Wasm/CPU/WebGPU, this library is not needed.

Thanks for clarifying, in this case we can just revert ort-web upgrade.

snnn
snnn previously approved these changes Mar 14, 2024
Copy link
Member

@snnn snnn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thank you!

Copy link
Member

@snnn snnn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In file included from /mnt/vss/_work/1/s/include/onnxruntime/core/graph/graph.h:24, you need to add some guards to disable external warnings. Like what we do in https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/core/graph/onnx_protobuf.h . Otherwise the web pipelines won't pass

Copy link
Contributor

@pranavsharma pranavsharma left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewing for admin.

@AdamLouly
Copy link
Contributor Author

Thanks for the review everyone!

@AdamLouly AdamLouly merged commit 3255813 into main Mar 14, 2024
92 of 95 checks passed
@AdamLouly AdamLouly deleted the adamlouly/update_ckpt_flatbuffers branch March 14, 2024 23:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants